35 research outputs found

    A Unifying Framework in Vector-valued Reproducing Kernel Hilbert Spaces for Manifold Regularization and Co-Regularized Multi-view Learning

    This paper presents a general vector-valued reproducing kernel Hilbert space (RKHS) framework for the problem of learning an unknown functional dependency between a structured input space and a structured output space. Our formulation encompasses both Vector-valued Manifold Regularization and Co-regularized Multi-view Learning, providing in particular a unifying framework linking these two important learning approaches. In the case of the least squares loss function, we provide a closed-form solution, which is obtained by solving a system of linear equations. In the case of Support Vector Machine (SVM) classification, our formulation generalizes both the binary Laplacian SVM to the multi-class, multi-view settings and the multi-class Simplex Cone SVM to the semi-supervised, multi-view settings. The solution is obtained by solving a single quadratic optimization problem, as in standard SVM, via the Sequential Minimal Optimization (SMO) approach. Empirical results obtained on the task of object recognition, using several challenging datasets, demonstrate the competitiveness of our algorithms compared with other state-of-the-art methods. Comment: 72 page
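
As a rough illustration of what "a closed-form solution obtained by solving a system of linear equations" can look like, the sketch below fits ordinary scalar-valued regularized least squares in an RKHS by solving (K + λnI)c = y. This is a deliberate simplification and an assumption on our part: it is not the paper's vector-valued, multi-view formulation, and the Gaussian kernel, the helper names (`gaussian_kernel`, `solve`, `fit`, `predict`), and the toy data are all hypothetical.

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    """Scalar Gaussian kernel (an assumed choice, not taken from the paper)."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(xs, ys, lam):
    """Return coefficients c solving (K + lam * n * I) c = y."""
    n = len(xs)
    A = [[gaussian_kernel(xs[i], xs[j]) + (lam * n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(A, ys)


def predict(xs, coeffs, x):
    """Evaluate f(x) = sum_i c_i k(x_i, x)."""
    return sum(c * gaussian_kernel(xi, x) for c, xi in zip(coeffs, xs))


# Toy data: four samples of y = x^2, with a very small regularizer.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
coeffs = fit(xs, ys, lam=1e-6)
fitted = [predict(xs, coeffs, x) for x in xs]
```

With a regularizer this small the fitted values nearly interpolate the training targets; a larger `lam` trades fit for smoothness.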

    Self-taught Object Localization with Deep Networks

    This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional human supervision, i.e., without using any ground-truth bounding boxes for training. The key idea is to analyze the change in the recognition scores when artificially masking out different regions of the image. Masking out a region that includes the object typically causes a significant drop in recognition score. This idea is embedded into an agglomerative clustering technique that generates self-taught localization hypotheses. Our object localization scheme outperforms existing proposal methods in both precision and recall for a small number of subwindow proposals (e.g., on ILSVRC-2012 it produces a relative gain of 23.4% over the state of the art for the top-1 hypothesis). Furthermore, our experiments show that the annotations automatically generated by our method can be used to train object detectors, yielding recognition results remarkably close to those obtained by training on manually annotated bounding boxes. Comment: WACV 201
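
The masking idea described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it slides a fixed-size window over a tiny image instead of using the paper's agglomerative clustering, and `toy_score` is a hypothetical stand-in for a deep network's recognition score (it simply averages pixel brightness). All function names are ours.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, height, width)


def mask_region(image: Image, box: Box, fill: float = 0.0) -> Image:
    """Return a copy of the image with the given box overwritten by `fill`."""
    top, left, height, width = box
    masked = [row[:] for row in image]
    for r in range(top, min(top + height, len(image))):
        for c in range(left, min(left + width, len(image[0]))):
            masked[r][c] = fill
    return masked


def score_drops(image: Image, score_fn: Callable[[Image], float],
                window: Tuple[int, int], stride: int) -> List[Tuple[float, Box]]:
    """Rank sliding windows by how much masking them lowers the score."""
    base = score_fn(image)
    height, width = window
    drops = []
    for top in range(0, len(image) - height + 1, stride):
        for left in range(0, len(image[0]) - width + 1, stride):
            box = (top, left, height, width)
            drops.append((base - score_fn(mask_region(image, box)), box))
    return sorted(drops, reverse=True)  # largest drop first


def toy_score(image: Image) -> float:
    """Hypothetical stand-in for a CNN recognition score: mean brightness."""
    return sum(sum(row) for row in image) / (len(image) * len(image[0]))


# A 6x6 dark image containing one bright 2x2 "object" at rows 2-3, cols 3-4.
image = [[0.0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(3, 5):
        image[r][c] = 1.0

ranked = score_drops(image, toy_score, window=(2, 2), stride=1)
best_drop, best_box = ranked[0]
```

In this toy setting the window that exactly covers the bright patch produces the largest score drop, which is the signal the abstract describes; a real system would replace `toy_score` with the classifier's confidence for the predicted class.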

    BEYOND MULTI-TARGET TRACKING: STATISTICAL PATTERN ANALYSIS OF PEOPLE AND GROUPS

    Every day, millions of surveillance cameras monitor the world, recording and collecting a huge amount of data. The collected data can be extremely useful, from behavior analysis that helps prevent unpleasant events to the analysis of traffic. However, these valuable data are seldom used because of the amount of information that a human operator has to attend to and examine manually; it would be like looking for a needle in a haystack. Automatic analysis is therefore becoming mandatory for extracting summarized high-level information (e.g., John, Sam and Anne are walking together as a group at the playground near the station) from the available redundant low-level data (e.g., an image sequence). The main goal of this thesis is to propose solutions and automatic algorithms that perform high-level analysis of a camera-monitored environment. In this way, the data are summarized in a high-level representation that is easier to understand. In particular, this work focuses on the analysis of moving people and their collective behaviors. The title of the thesis, beyond multi-target tracking, mirrors the purpose of the work: we propose methods that have target tracking as a common denominator and go beyond the standard techniques in order to provide a high-level description of the data. First, we investigate the target tracking problem, as it is the basis of all the subsequent work. Target tracking estimates the position of each target in the image and its trajectory over time. We analyze the problem from two complementary perspectives: 1) the engineering point of view, where we address the problem in order to obtain the best results in terms of accuracy and performance; 2) the neuroscience point of view, where we propose an attentional model for tracking and recognition of objects and people, motivated by theories of the human perceptual system. Second, target tracking is extended to the camera network case, where the goal is to keep a unique identifier for each person across the whole network, i.e., to perform person re-identification. The goal is to recognize individuals in diverse locations over different non-overlapping camera views, or within the same camera, considering a large set of candidates. In this context, we propose a pipeline and appearance-based descriptors that enable us to define the problem properly and to reach state-of-the-art results. Finally, the highest level of description investigated in this thesis is the analysis (discovery and tracking) of social interaction between people. In particular, we focus on finding small groups of people. We introduce methods that embed notions of social psychology into computer vision algorithms. Then, we extend the detection of social interaction over time, proposing novel probabilistic models that deal with (joint) individual-group tracking.

    Symmetry-Driven Accumulation of Local Features for Human Characterization and Re-identification

    This work proposes a method to characterize the appearance of individuals by exploiting visual cues of the body. The method is based on a symmetry-driven, appearance-based descriptor and a matching policy that allows an individual to be recognized. The descriptor encodes three complementary visual characteristics of human appearance: the overall chromatic content, the spatial arrangement of colors into stable regions, and the presence of recurrent local motifs with high entropy. The characteristics are extracted by following symmetry and asymmetry perceptual principles, which make it possible to segregate meaningful body parts and to focus on the human body only, pruning out the background clutter. The descriptor handles the case where a single image of the individual is available, as well as the case where multiple pictures of the same identity are available, as in a tracking scenario. The descriptor is dubbed Symmetry-Driven Accumulation of Local Features (SDALF). Our approach is applied to two different scenarios: re-identification and multi-target tracking. In the former, we show the capabilities of SDALF in encoding distinctive aspects of an individual, focusing on its robustness to very low-resolution images, occlusions, pose changes, and variations of viewpoint and scene illumination. SDALF has been tested on various benchmark datasets, obtaining convincing performance in general and setting the state of the art in some cases. The latter scenario shows the benefits of using SDALF as the observation model for different trackers, boosting their performance in several respects on the CAVIAR dataset.

    Joint Individual-Group Modeling for Tracking

    We present a novel probabilistic framework that jointly models individuals and groups for tracking. Managing groups is challenging, primarily because of their nonlinear dynamics and complex layout, which lead to repeated splitting and merging events. The proposed approach assumes a tight relation of mutual support between the modeling of individuals and groups, promoting the idea that groups are better modeled if individuals are considered and vice versa. This concept is translated into a mathematical model using a decentralized particle filtering framework that deals with a joint individual-group state space. The model factorizes the joint space into two dependent subspaces, where individuals and groups share the knowledge of the joint individual-group distribution. The assignment of people to the different groups (and thus group initialization, split and merge) is implemented by two alternative strategies: using classifiers trained beforehand on statistics of group configurations, and through online learning of a Dirichlet process mixture model, assuming that no training data is available before tracking. These strategies lead to two different methods that can be used on top of any person detector (simulated using the ground truth in our experiments). We provide convincing results on two recent challenging tracking benchmarks.

    Chapter 3 - Group Detection and Tracking Using Sociological Features

    This chapter describes the most common features and definitions from the sociological science used to detect and track groups of people that are interacting. The necessity of having reliable algorithms to cope with these problems is gaining increasing interest, especially in the fields related to security and video surveillance. Answering the question of “who is present and with whom he/she is interacting in a scene?” is nowadays of utmost importance. Other domains require having good algorithms to face these problems, for example, activity recognition, social robotics, and automatic behavior analysis. The success of detection and tracking algorithms relies on the engineering of the features. In this context, the literature of sociological sciences gives us a set of well-established assumptions and constraints to create more reliable and plausible features and detection algorithms. In this chapter we will describe the existing features of the following two categories: the low-level category used to determine the spatial properties of each person in a scene (person position and head/body orientation), and the high-level category that agglomerates or uses the low-level features to implement sociological and biological definitions (frustum of visual attention). We will see how these features are used by the popular methods of group detection, such as game theory-based and probabilistic approaches. Finally, we will analyze a tracking model that can be integrated with the analyzed features and the described detection methods. The experimental part provides a comprehensive comparison of the performances of different algorithms to detect and track groups on standard and publicly available benchmarks

    Decentralized particle filter for joint individual-group tracking

    In this paper, we address the task of tracking groups of people in surveillance scenarios. This is a major challenge in computer vision, since groups are structured entities, subject to repeated split and merge events. Our solution is a joint individual-group tracking framework, inspired by a recent technique dubbed decentralized particle filtering. The proposed strategy factorizes the joint individual-group state space into two dependent subspaces, where individuals and groups share the knowledge of the joint individual-group distribution. In practice, we establish a tight relation of mutual support between the modeling of individuals and that of groups, promoting the idea that groups are better tracked if individuals are considered, and vice versa. Extensive experiments on a published and novel dataset validate our intuition and open up many future developments.

    A comparison of multi hypothesis kalman filter and particle filter for multi-target tracking

    Visual tracking of multiple targets is a key step in surveillance scenarios and is far from being solved because of its intrinsically ill-posed nature. In this paper, a comparison of Multi-Hypothesis Kalman Filter and Particle Filter-based tracking is presented. Both methods receive input from a novel online background subtraction algorithm. The aim of this work is to highlight the advantages and disadvantages of these tracking techniques. Experiments are performed on a challenging public dataset (PETS 2009), in order to evaluate the approaches on significant benchmark data.
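
To make the two filter families concrete, here is a minimal sketch that runs a scalar Kalman filter and a bootstrap particle filter side by side on the same simulated 1D track. It is a simplification under our own assumptions: a single target with a random-walk motion model and Gaussian noise, not the multi-hypothesis, multi-target setting evaluated in the paper. The noise values, particle count, and function names are hypothetical.

```python
import math
import random


def kalman_step(mean, var, z, q, r):
    """One predict/update cycle of a scalar Kalman filter (random-walk model)."""
    var_pred = var + q                      # predict: uncertainty grows by q
    gain = var_pred / (var_pred + r)        # Kalman gain
    return mean + gain * (z - mean), (1.0 - gain) * var_pred


def particle_step(particles, z, q, r, rng):
    """One predict/weight/resample cycle of a bootstrap particle filter."""
    moved = [p + rng.gauss(0.0, math.sqrt(q)) for p in particles]
    weights = [math.exp(-0.5 * (z - p) ** 2 / r) for p in moved]
    if sum(weights) == 0.0:                 # guard against numerical underflow
        weights = [1.0] * len(moved)
    return rng.choices(moved, weights=weights, k=len(moved))


def run_demo(steps=50, q=0.01, r=0.25, n_particles=500, seed=0):
    """Track one simulated 1D target with both filters; return final values."""
    rng = random.Random(seed)
    truth = 0.0
    kf_mean, kf_var = 0.0, 1.0
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for _ in range(steps):
        truth += rng.gauss(0.0, math.sqrt(q))        # true motion
        z = truth + rng.gauss(0.0, math.sqrt(r))     # noisy observation
        kf_mean, kf_var = kalman_step(kf_mean, kf_var, z, q, r)
        particles = particle_step(particles, z, q, r, rng)
    pf_mean = sum(particles) / len(particles)
    return truth, kf_mean, pf_mean


truth, kf_mean, pf_mean = run_demo()
```

On this linear-Gaussian toy problem the two estimates should stay close to each other, since the Kalman filter is optimal there; the particle filter's extra cost pays off mainly when the dynamics or the observation model are nonlinear or multi-modal.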

    Collaborative Particle Filters for Group Tracking

    Tracking groups of people is a highly informative task in surveillance, and it remains an open and little-explored problem. In this paper, we propose a new framework for group tracking that consists of two separate particle filters, one focusing on groups as atomic entities (the multi-group tracker), and the other modeling each individual separately (the multi-object tracker). The latter helps the multi-group tracker to better define the nature of a group, by evaluating the membership of each individual with respect to different groups and allowing robust management of occlusions. The coupling of the two processes is theoretically founded: the posterior distribution of the multi-group tracker is revised with the statistics accumulated by the multi-object tracker. Comparative experimental results confirm the effectiveness of the proposed technique.